Improve failure mode, add multiple DCs #1273

as51340 · 2025-04-29T19:50:20Z

Release note

Documented the types of failures tolerated by our current highly available cluster model. Documented a possible architecture for deployments that span multiple data centers.

Related product PRs

Checklist:

vercel · 2025-04-29T19:50:25Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
documentation	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	May 21, 2025 10:01am

gitbuda · 2025-04-30T09:48:08Z

pages/clustering/high-availability.mdx

+The architecture we currently use allows us to deploy coordinators in 3 data centers and hence tolerate a failure of the whole data center. Data instances can be freely
+distributed in any way you want between data centers.


I would extend this with some notes on the expected system requirements, e.g., the latency should be under N ms 🤔

as51340 · 2025-04-30

I think it's not necessary. Failover will be slower, but a slower network IMO still doesn't disqualify the architecture.
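
To illustrate the claim in the diff above, here is a minimal sketch of why placing one coordinator in each of 3 data centers lets the cluster tolerate the loss of a whole data center. It assumes coordinators make decisions by strict majority quorum, as in Raft-style consensus; that assumption, the coordinator names, and the data center names are hypothetical and not taken from this PR.

```python
from collections import Counter


def has_quorum(alive: int, total: int) -> bool:
    """Return True if the alive coordinators form a strict majority."""
    return alive > total // 2


def survives_any_single_dc_failure(placement: dict[str, str]) -> bool:
    """Check whether a quorum remains after losing any one data center.

    `placement` maps a (hypothetical) coordinator name to its data center.
    """
    total = len(placement)
    per_dc = Counter(placement.values())
    # Losing a data center removes every coordinator placed in it.
    return all(has_quorum(total - lost, total) for lost in per_dc.values())


# One coordinator per data center: any single DC can fail.
three_dcs = {"coord_1": "dc_a", "coord_2": "dc_b", "coord_3": "dc_c"}
# Two coordinators share a DC: losing that DC loses the majority.
two_dcs = {"coord_1": "dc_a", "coord_2": "dc_a", "coord_3": "dc_b"}

print(survives_any_single_dc_failure(three_dcs))  # True
print(survives_any_single_dc_failure(two_dcs))    # False
```

Under this assumption, only the placement of coordinators matters for the check, which is consistent with the diff's statement that data instances can be distributed freely.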

antejavor

Just a small typo + rewording.

antejavor · 2025-05-20T11:19:22Z

pages/clustering/high-availability.mdx

+## Data center failure
+
+The architecture we currently use allows us to deploy coordinators in 3 data centers and hence tolerate a failure of the whole data center. Data instances can be freely
+distributed in any way you want between data centers. The failover time will be slighlty increased due to the network communication needed.
+


Suggested change

-## Data center failure
-
-The architecture we currently use allows us to deploy coordinators in 3 data centers and hence tolerate a failure of the whole data center. Data instances can be freely
-distributed in any way you want between data centers. The failover time will be slighlty increased due to the network communication needed.
+## Data center failure
+
+The architecture we currently use allows us to deploy coordinators in 3 data centers and hence tolerate a failure of the whole data center. Data instances can be freely
+distributed in any way you want between data centers. The failover time will be slightly increased due to the need for network communication.

antejavor

This also points to main branch.

as51340 · 2025-05-20T13:48:24Z

> This also points to main branch.

Feel free to merge the suggestion. The PR should go into main because it isn't tied to anything specific to Memgraph 3.3.

antejavor · 2025-05-21T09:56:15Z

Cool @as51340, this is part of milestone 3.3, hence the comment.

Improve failure mode, add multiple DCs

fc68847

as51340 self-assigned this Apr 29, 2025

vercel bot deployed to Preview April 29, 2025 19:55 View deployment

Remove crash-stop, add omission faults

901fa95

vercel bot deployed to Preview April 29, 2025 20:40 View deployment

gitbuda reviewed Apr 30, 2025

Document slower failover time

953309d

as51340 added the "priority: low (improvements)" and "status: ready" labels May 2, 2025

as51340 added this to the Memgraph 3.3 milestone May 2, 2025

vercel bot deployed to Preview May 2, 2025 06:05 View deployment

antejavor approved these changes May 20, 2025

antejavor requested changes May 20, 2025

as51340 marked this pull request as ready for review May 20, 2025 13:48

as51340 requested a review from katarinasupe as a code owner May 20, 2025 13:48

Merge branch 'main' into improve-ha-docs

9dde05e

antejavor removed this from the Memgraph 3.3 milestone May 21, 2025

vercel bot deployed to Preview May 21, 2025 10:01 View deployment
